R1: Cut down some sections (3.2.1, 3.2.2, and 3.2.5) to make room for the qualitative examples. We will revise the paper accordingly in the final version. We have added experiments on MS-COCO and Flickr30k using single-head attention (Table 1). R2: The base attention model performs better than up-down and GCN-LSTM. In addition, our experimental results showed that increasing the number of min.
Reviews: Adaptively Aligned Image Captioning via Adaptive Attention Time
Although the two techniques have been well explored individually, this is the first work combining them for attention in image captioning. This should make reproducing the results easier. The base attention model already does much better than up-down attention and recent methods like GCN-LSTM, so it's not clear where the gains are coming from. It would be good to see AAT applied to traditional single-head attention instead of multi-head attention to convincingly show that AAT helps. For instance, how do the attention time steps vary with word position in the caption?
Modeling and Output Layers in BiDAF -- an Illustrated Guide with Minions
The output of the aforementioned attention step is a giant matrix called G. G is an 8d-by-T matrix that encodes the Query-aware representations of the Context words. G is the input to the modeling layer, which will be the focus of this article. OK, so I know we've been through a lot of steps in the past three articles. It is extremely easy to get lost in the myriad of symbols and equations, especially considering that the choice of symbols in the BiDAF paper isn't that "user friendly." I mean, do you still remember what each of H, U, Ĥ and Ũ represents?
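To make the shapes concrete before we dive into the modeling layer, here is a minimal NumPy sketch of how G is assembled (the sizes d and T and the variable names are made up for illustration; the fusion [H; Ũ; H∘Ũ; H∘Ĥ] follows the BiDAF paper, where H is the Context encoding, Ũ the attended Query vectors, and Ĥ the attended Context vectors):

```python
import numpy as np

# Hypothetical sizes: d is the LSTM hidden size, T the number of Context words.
d, T = 100, 20
rng = np.random.default_rng(0)

# Stand-ins for the attention step's outputs, each of shape 2d-by-T:
H = rng.standard_normal((2 * d, T))        # Context encoding
U_tilde = rng.standard_normal((2 * d, T))  # attended Query vectors (Ũ)
H_tilde = rng.standard_normal((2 * d, T))  # attended Context vectors (Ĥ)

# G stacks [H; Ũ; H ∘ Ũ; H ∘ Ĥ] row-wise, so every Context word (column)
# gets an 8d-dimensional Query-aware representation.
G = np.vstack([H, U_tilde, H * U_tilde, H * H_tilde])
print(G.shape)  # (800, 20), i.e. 8d-by-T
```

Each column of G is one Context word, which is why the modeling layer can run a sequence model along the T axis.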